Analysing grouping of nucleotides in DNA sequences using lumped processes constructed from Markov chains.

Authors

  • Yann Guédon
  • Yves d'Aubenton-Carafa
  • Claude Thermes
Abstract

The most commonly used models for analysing local dependencies in DNA sequences are (high-order) Markov chains. Incorporating knowledge about the possible grouping of the nucleotides makes it possible to define dedicated sub-classes of Markov chains. The problem of formulating lumpability hypotheses for a Markov chain is therefore addressed. In the classical approach to lumpability, this problem can be formulated as the determination of an appropriate state space (smaller than the original state space) such that the lumped chain defined on this state space retains the Markov property. We propose a different perspective on lumpability where the state space is fixed and the partitioning of this state space is represented by a one-to-many probabilistic function within a two-level stochastic process. Three nested classes of lumped processes can be defined in this way as sub-classes of first-order Markov chains. These lumped processes enable parsimonious reparameterizations of Markov chains that help to reveal relevant partitions of the state space. Characterizations of the lumped processes on the original transition probability matrix are derived. Different model selection methods, relying either on hypothesis testing or on penalized log-likelihood criteria, are presented, as well as extensions to lumped processes constructed from high-order Markov chains. The relevance of the proposed approach to lumpability is illustrated by the analysis of DNA sequences. In particular, the use of lumped processes makes it possible to highlight differences between intronic sequences and gene untranslated region sequences.
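The classical lumpability condition mentioned in the abstract can be illustrated with a small sketch. It checks the standard strong-lumpability criterion (every state in a block must have the same total probability of moving into each block) for a first-order chain on the four nucleotides. The function name `lumped_matrix`, the transition matrix `P` and the purine/pyrimidine partition are all hypothetical choices made for illustration; none of them come from the paper, and the sketch does not implement the authors' two-level lumped processes.

```python
# Illustrative sketch only: the matrix values and names below are assumptions,
# not data or methods taken from the paper.

def lumped_matrix(P, partition, tol=1e-9):
    """Return the lumped transition matrix if P is strongly lumpable
    with respect to `partition`, otherwise return None.

    P: dict mapping each state to a dict of next-state probabilities.
    partition: list of blocks, each block being a list of states.
    """
    lumped = []
    for block in partition:
        # For each state in the block, the probability of moving into each block.
        rows = [
            [sum(P[s][t] for t in target) for target in partition]
            for s in block
        ]
        # Classical condition: all states of a block agree on these probabilities.
        reference = rows[0]
        for row in rows[1:]:
            if any(abs(a - b) > tol for a, b in zip(reference, row)):
                return None
        lumped.append(reference)
    return lumped


# Hypothetical first-order transition matrix on the nucleotides (rows sum to 1).
P = {
    "A": {"A": 0.3, "C": 0.2, "G": 0.4, "T": 0.1},
    "C": {"A": 0.1, "C": 0.4, "G": 0.3, "T": 0.2},
    "G": {"A": 0.5, "C": 0.1, "G": 0.2, "T": 0.2},
    "T": {"A": 0.2, "C": 0.3, "G": 0.2, "T": 0.3},
}

# Purines {A, G} versus pyrimidines {C, T}: lumpable for this particular matrix.
purine_pyrimidine = lumped_matrix(P, [["A", "G"], ["C", "T"]])

# Another grouping, {A, C} versus {G, T}: not lumpable for this matrix.
other_grouping = lumped_matrix(P, [["A", "C"], ["G", "T"]])
```

With this matrix, the purine/pyrimidine grouping yields a two-state chain, while the second grouping returns `None` because G and T disagree on their block-level transition probabilities.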


Related articles

Evaluation of First and Second Markov Chains Sensitivity and Specificity as Statistical Approach for Prediction of Sequences of Genes in Virus Double Strand DNA Genomes

The growing amount of information on biological sequences has made the application of statistical approaches necessary for modeling and estimating their functions. In this paper, the sensitivity and specificity of the first and second Markov chains for prediction of genes were evaluated using the complete double stranded DNA virus. There were two approaches for prediction of each Markov Model parameter,...


Probabilistic Sufficiency and Algorithmic Sufficiency from the point of view of Information Theory

Given the importance of Markov chains in information theory, conditional probability for these random processes can also be defined in terms of mutual information. In this paper, the relationship between the concept of sufficiency and Markov chains from the perspective of information theory and the relationship between probabilistic sufficiency and algorithmic sufficien...


Distribution of First Passage Times for Lumped States in Markov Chains

First passage time in Markov chains is defined as the first time that a chain passes a specified state or set of lumped states. This state or set of lumped states may mark the first occurrence of an interesting or rare event. In this study, obtaining distribution of the first passage time relating to lumped states which are constructed by gathering the states through lumping method for an irreduci...


Computational Biology Lecture 9: CpG islands, Markov Chains, Hidden Markov Models HMMs

Given a DNA or an amino acid sequence, biologists would like to know what the sequence represents. For instance, is a particular DNA sequence a gene or not? Another example would be to identify which family of proteins a given protein (amino acid sequence) belongs to. In both cases above, we have a sequence of symbols from some alphabet and we are required to say something about the structure o...


Computational Investigation on Structural Properties of Carbon Nanotube Binding to Nucleotides According to the QM Methods

The interaction between nucleotides and carbon nanotubes (CNTs) is a subject of many investigations for treating diseases but there are many questions in this field that remain unanswered. Because experimental methods involve assumptions and interpretation besides limitations, there are many problems for which the best approach is a theoretical study. Consequently, t...



Journal:
  • Journal of mathematical biology

Volume 52, Issue 3

Pages: -

Publication date: 2006